{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1YRGLD1pOOrJ"
      },
      "source": [
        "##### Copyright 2021 The TensorFlow Federated Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "koW3R4ntOgLS"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "grBmytrShbUE"
      },
      "source": [
        "# Random noise generation in TFF\n",
        "\n",
        "This tutorial will discuss the recommended best practices for random noise generation in TFF. Random noise generation is an important component of many privacy protection techniques in federated learning algorithms, e.g., differential privacy."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "coAumH42q9nz"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://www.tensorflow.org/federated/tutorials/random_noise_generation\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/federated/blob/master/docs/tutorials/random_noise_generation.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/federated/blob/master/docs/tutorials/random_noise_generation.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/federated/docs/tutorials/random_noise_generation.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yiq_MY4LopET"
      },
      "source": [
        "## Before we begin\n",
        "\n",
        "First, let us make sure the notebook is connected to a backend that has the relevant components compiled. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ke7EyuvG0Zyn"
      },
      "outputs": [],
      "source": [
        "#@test {\"skip\": true}\n",
        "!pip install --quiet --upgrade tensorflow_federated_nightly\n",
        "!pip install --quiet --upgrade nest_asyncio\n",
        "\n",
        "import nest_asyncio\n",
        "nest_asyncio.apply()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rtgStTrNIId-"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "import tensorflow_federated as tff"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X6eWsahmQpmi"
      },
      "source": [
        "Run the following \"Hello World\"\n",
        "example to make sure the TFF environment is correctly setup. If it doesn't work,\n",
        "please refer to the [Installation](../install.md) guide for instructions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wjX3wmC-P1aE"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "b'Hello, World!'"
            ]
          },
          "execution_count": 3,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "@tff.federated_computation\n",
        "def hello_world():\n",
        "  return 'Hello, World!'\n",
        "\n",
        "hello_world()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A8xl6I2X9ObS"
      },
      "source": [
        "## Random noise on clients\n",
        "\n",
        "The need for noise on clients generally falls into two cases: identical noise and i.i.d. noise.\n",
        "\n",
        "*   For identical noise, the recommended pattern is to maintain a seed on the server, broadcast it to clients, and use the `tf.random.stateless`\n",
        "functions to generate noise.\n",
        "*   For i.i.d. noise, use a tf.random.Generator initialized on the client with from_non_deterministic_state, in keeping with TF's recommendation to avoid the tf.random.\\<distribution\\> functions.\n",
        "\n",
        "Client behavior is different from server (doesn't suffer from the pitfalls discussed later) because each client will build their own computation graph and initialize their own default seed."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ykw-7vrN_WC8"
      },
      "source": [
        "### Identical noise on clients"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aZk9h1nb9nLN"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Seed: [1 1]. All clients sampled value    1.665.\n",
            "Seed: [2 2]. All clients sampled value   -0.219.\n"
          ]
        }
      ],
      "source": [
        "# Set to use 10 clients.\n",
        "tff.backends.native.set_local_execution_context(num_clients=10)\n",
        "\n",
        "@tff.tf_computation\n",
        "def noise_from_seed(seed):\n",
        "  return tf.random.stateless_normal((), seed=seed)\n",
        "\n",
        "seed_type_at_server = tff.type_at_server(tff.to_type((tf.int64, [2])))\n",
        "\n",
        "@tff.federated_computation(seed_type_at_server)\n",
        "def get_random_min_and_max_deterministic(seed):\n",
        "  # Broadcast seed to all clients.\n",
        "  seed_on_clients = tff.federated_broadcast(seed)\n",
        "\n",
        "  # Clients generate noise from seed deterministicly.\n",
        "  noise_on_clients = tff.federated_map(noise_from_seed, seed_on_clients)\n",
        "\n",
        "  # Aggregate and return the min and max of the values generated on clients.\n",
        "  min = tff.aggregators.federated_min(noise_on_clients)\n",
        "  max = tff.aggregators.federated_max(noise_on_clients)\n",
        "  return min, max\n",
        "\n",
        "seed = tf.constant([1, 1], dtype=tf.int64)\n",
        "min, max = get_random_min_and_max_deterministic(seed)\n",
        "assert min == max\n",
        "print(f'Seed: {seed.numpy()}. All clients sampled value {min:8.3f}.')\n",
        "\n",
        "seed += 1\n",
        "min, max = get_random_min_and_max_deterministic(seed)\n",
        "assert min == max\n",
        "print(f'Seed: {seed.numpy()}. All clients sampled value {min:8.3f}.')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P_FqzcAGAHq_"
      },
      "source": [
        "### Independent noise on clients"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SNU-ZECrAMiI"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Values differ across clients.   -1.810,   1.079.\n",
            "Values differ across rounds.    -1.205,   0.851.\n"
          ]
        }
      ],
      "source": [
        "@tff.tf_computation\n",
        "def nondeterministic_noise():\n",
        "  gen = tf.random.Generator.from_non_deterministic_state()\n",
        "  return gen.normal(())\n",
        "\n",
        "@tff.federated_computation(seed_type_at_server)\n",
        "def get_random_min_and_max_nondeterministic(seed):\n",
        "  noise_on_clients = tff.federated_eval(nondeterministic_noise, tff.CLIENTS)\n",
        "  min = tff.aggregators.federated_min(noise_on_clients)\n",
        "  max = tff.aggregators.federated_max(noise_on_clients)\n",
        "  return min, max\n",
        "\n",
        "min, max = get_random_min_and_max_nondeterministic(seed)\n",
        "assert min != max\n",
        "print(f'Values differ across clients. {min:8.3f},{max:8.3f}.')\n",
        "\n",
        "new_min, new_max = get_random_min_and_max_nondeterministic(seed)\n",
        "assert new_min != new_max\n",
        "assert new_min != min and new_max != max\n",
        "print(f'Values differ across rounds.  {new_min:8.3f},{new_max:8.3f}.')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mL7ZiI6A_GyX"
      },
      "source": [
        "## Random noise on the server"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C2-BdlyAId1_"
      },
      "source": [
        "### Discouraged usage: directly using `tf.random.normal`\n",
        "\n",
        "TF1.x like APIs `tf.random.normal` for random noise generation are strongly discouraged in TF2 according to the [random noise generation tutorial in TF](https://www.tensorflow.org/guide/random_numbers). Surprising behavior may happen when these APIs are used together with `tf.function` and `tf.random.set_seed`. For example, the following code will generate the same value with each call. This surprising behavior is expected for TF, and explanation can be found in the [documentation of `tf.random.set_seed`](https://www.tensorflow.org/api_docs/python/tf/random/set_seed). "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0S7t0-3hHCWc"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "0.3052047 0.3052047\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        " \n",
        "@tf.function\n",
        "def return_one_noise(_):\n",
        "  return tf.random.normal([])\n",
        "\n",
        "n1=return_one_noise(1)\n",
        "n2=return_one_noise(2) \n",
        "assert n1 == n2\n",
        "print(n1.numpy(), n2.numpy())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6vmWv0ALKvqh"
      },
      "source": [
        "In TFF, things are slightly different. If we wrap the noise generation as `tff.tf_computation` instead of `tf.function`, non-deterministic random noise will be generated. However, if we run this code snippet multiple times, different set of `(n1, n2)` will be generated each time. There is no easy way to set a global random seed for TFF."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "D_5T0UzHKtde"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "1.3283143 0.45740178\n"
          ]
        }
      ],
      "source": [
        "tf.random.set_seed(1)\n",
        " \n",
        "@tff.tf_computation\n",
        "def return_one_noise(_):\n",
        "  return tf.random.normal([])\n",
        "\n",
        "n1=return_one_noise(1)\n",
        "n2=return_one_noise(2) \n",
        "assert n1 != n2\n",
        "print(n1, n2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GJMdUjhxWPcR"
      },
      "source": [
        "Moreover, deterministic noise can be generated in TFF without explicitly setting a seed. The function `return_two_noise` in the following code snippet returns two identical noise values. This is expected behavior because TFF will build computation graph in advance before execution. However, this suggests users have to pay attention on the usage of `tf.random.normal` in TFF."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k0jtUXzCSTCN"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-0.15665223 -0.15665223\n"
          ]
        }
      ],
      "source": [
        "@tff.tf_computation\n",
        "def tff_return_one_noise():\n",
        "  return tf.random.normal([])\n",
        "\n",
        "@tff.federated_computation\n",
        "def return_two_noise():\n",
        "  return (tff_return_one_noise(), tff_return_one_noise())\n",
        "\n",
        "n1, n2=return_two_noise() \n",
        "assert n1 == n2\n",
        "print(n1, n2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Wk0UhmhuYtr8"
      },
      "source": [
        "### Usage with care: `tf.random.Generator`\n",
        "\n",
        "We can use `tf.random.Generator` as suggested in the [TF tutorial](https://www.tensorflow.org/guide/random_numbers). "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SuYiH7n5ZTej"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "0.3052047 -0.38260338\n"
          ]
        }
      ],
      "source": [
        "@tff.tf_computation\n",
        "def tff_return_one_noise(i):\n",
        "  g=tf.random.Generator.from_seed(i)\n",
        "  @tf.function\n",
        "  def tf_return_one_noise():\n",
        "    return g.normal([])\n",
        "  return tf_return_one_noise()\n",
        "\n",
        "@tff.federated_computation\n",
        "def return_two_noise():\n",
        "  return (tff_return_one_noise(1), tff_return_one_noise(2))\n",
        "\n",
        "n1, n2 = return_two_noise() \n",
        "assert n1 != n2\n",
        "print(n1, n2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HU8tKbmvqN_w"
      },
      "source": [
        "However, users may have to be careful on its usage\n",
        "\n",
        "\n",
        "*   `tf.random.Generator` uses `tf.Variable` to maintain the states for RNG algorithms. In TFF, it is recommended to contruct the generator inside a `tff.tf_computation`; and it is difficult to pass the generator and its state between `tff.tf_computation` functions.\n",
        "*   the previous code snippet also relies on carefully setting seeds in generators. We will get expected but surprising results (deterministic `n1==n2`) if we use `tf.random.Generator.from_non_deterministic_state()` instead. \n",
        "\n",
        "In general, TFF prefers functional operations and we will showcase the usage of `tf.random.stateless_*` functions in the following sections."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pImReFSuIaCq"
      },
      "source": [
        "In TFF for federated learning, we often work with nested structures instead of scalars and the previous code snippet can be naturally extended to nested structures."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "B45urU98Fb8U"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "n1 [array([[0.3052047 , 0.5671378 ],\n",
            "       [0.41852272, 0.2326421 ]], dtype=float32), array([1.1675092], dtype=float32)]\n",
            "n2 [array([[-0.38260338, -0.47804865],\n",
            "       [-0.5187485 , -1.8471988 ]], dtype=float32), array([-0.77835274], dtype=float32)]\n"
          ]
        }
      ],
      "source": [
        "@tff.tf_computation\n",
        "def tff_return_one_noise(i):\n",
        "  g=tf.random.Generator.from_seed(i)\n",
        "  weights = [\n",
        "         tf.ones([2, 2], dtype=tf.float32),\n",
        "         tf.constant([2], dtype=tf.float32)\n",
        "     ]\n",
        "  @tf.function\n",
        "  def tf_return_one_noise():\n",
        "    return tf.nest.map_structure(lambda x: g.normal(tf.shape(x)), weights)\n",
        "  return tf_return_one_noise()\n",
        "\n",
        "@tff.federated_computation\n",
        "def return_two_noise():\n",
        "  return (tff_return_one_noise(1), tff_return_one_noise(2))\n",
        "\n",
        "n1, n2 = return_two_noise() \n",
        "assert n1[1] != n2[1]\n",
        "print('n1', n1)\n",
        "print('n2', n2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aMLslYfa76cm"
      },
      "source": [
        "### Recommended usage: `tf.random.stateless_*` with a helper\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TnyhlV0fIxYR"
      },
      "source": [
        "A general recommendation in TFF is to use the functional `tf.random.stateless_*` functions for random noise generation. These functions take `seed` as an explicit input argument to generate random noise. We first define a helper class to maintain the seed as pseudo state. The helper `RandomSeedGenerator` has functional operators in a state-in-state-out fashion. It is reasonable to use a counter as pseudo state for `tf.random.stateless_*` as these functions scramble the seed before using it to make noises generated by correlated seeds statistically uncorrelated."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NF1gaMgrKdwU"
      },
      "outputs": [],
      "source": [
        "class RandomSeedGenerator():\n",
        "\n",
        "  def initialize(self, seed=None):\n",
        "    if seed is None:\n",
        "      return tf.cast(tf.stack(\n",
        "          [tf.math.floor(tf.timestamp()*1e6),\n",
        "            tf.math.floor(tf.math.log(tf.timestamp()*1e6))]), dtype=tf.int64)\n",
        "    else:\n",
        "      return tf.constant(self.seed, dtype=tf.int64, shape=(2,))\n",
        "\n",
        "  def next(self, state):\n",
        "    return state + 1\n",
        "\n",
        "  def structure_next(self, state, nest_structure):\n",
        "    \"Returns seed in nexted structure and the next state seed.\"\n",
        "    flat_structure = tf.nest.flatten(nest_structure)\n",
        "    flat_seeds = [state+i for i in range(len(flat_structure))]\n",
        "    nest_seeds = tf.nest.pack_sequence_as(nest_structure, flat_seeds)\n",
        "    return nest_seeds, flat_seeds[-1] + 1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "x9kc6G0RIATV"
      },
      "source": [
        "Now let us use the helper class and `tf.random.stateless_normal` to generate (nested structure of) random noise in TFF. The following code snippet looks a lot like a TFF iterative process, see [simple_fedavg](https://github.com/tensorflow/federated/blob/master/tensorflow_federated/python/examples/simple_fedavg/simple_fedavg_tff.py) as an example of expressing federated learning algorithm as TFF iterative process. The pseudo seed state here for random noise generation is `tf.Tensor` that can be easily transported in TFF and TF functions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2dZn7LtjL_hk"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "n1 [array([[-0.21598858, -0.30700883],\n",
            "       [ 0.7562299 , -0.21218438]], dtype=float32), array([-1.0359321], dtype=float32)]\n",
            "n2 [array([[ 1.0722181 ,  0.81287116],\n",
            "       [-0.7140338 ,  0.5896157 ]], dtype=float32), array([0.44190162], dtype=float32)]\n"
          ]
        }
      ],
      "source": [
        "@tff.tf_computation\n",
        "def tff_return_one_noise(seed_state):\n",
        "  g=RandomSeedGenerator()\n",
        "  weights = [\n",
        "         tf.ones([2, 2], dtype=tf.float32),\n",
        "         tf.constant([2], dtype=tf.float32)\n",
        "     ]\n",
        "  @tf.function\n",
        "  def tf_return_one_noise():\n",
        "    nest_seeds, updated_state = g.structure_next(seed_state, weights)\n",
        "    nest_noise = tf.nest.map_structure(lambda x,s: tf.random.stateless_normal(\n",
        "        shape=tf.shape(x), seed=s), weights, nest_seeds)\n",
        "    return nest_noise, updated_state\n",
        "  return tf_return_one_noise()\n",
        "\n",
        "@tff.tf_computation\n",
        "def tff_init_state():\n",
        "  g=RandomSeedGenerator()\n",
        "  return g.initialize()\n",
        "\n",
        "@tff.federated_computation\n",
        "def return_two_noise():\n",
        "  seed_state = tff_init_state()\n",
        "  n1, seed_state = tff_return_one_noise(seed_state)\n",
        "  n2, seed_state = tff_return_one_noise(seed_state)\n",
        "  return (n1, n2)\n",
        "\n",
        "n1, n2 = return_two_noise() \n",
        "assert n1[1] != n2[1]\n",
        "print('n1', n1)\n",
        "print('n2', n2)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "random_noise_generation.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
